We want to study the notion of integration for a generic measurable function $f$ on $E$. Here $f$ is general in two senses:
$f$ takes values in the extended reals $[-\infty, \infty]$, so it can be unbounded.
The domain $E$, on which $f$ is defined, is measurable but can have infinite measure.
As in Royden’s book, we will establish the integration of a general case step by step:
Define Lebesgue integration for simple functions on any measurable set $E$
Define Lebesgue integration for bounded functions on $E$ with finite measure
Define Lebesgue integration for non-negative functions on $E$ with any measure (possibly infinite)
Define Lebesgue integration for the general case
For each scenario, we will:
establish linearity and monotonicity
establish additivity of the integral over domains
study convergence: under what conditions we can exchange $\lim$ and $\int$, i.e. if $f_n \to f$ in some sense, when can we have $\lim_n \int_E f_n = \int_E f$
Also note that in general the integral can be $\pm\infty$. Hence we are interested in cases where $\int_E |f| < \infty$. We call this case "$f$ is integrable over $E$".
Lebesgue Integration for simple functions
In this note, we may skip the proofs of some simple lemmas and theorems.
Let $\varphi$ be a simple function on $E$ with finite measure, and express $\varphi$ as $\varphi(x) = \sum_{k=1}^{n} a_k \cdot I\{x \in E_k\}$, where $\varphi$ takes the value $a_k$ on the measurable subset $E_k \subseteq E$.
Then the Lebesgue integral of $\varphi$ over $E$ is defined as $\int_E \varphi = \sum_{k=1}^{n} a_k \cdot m(E_k)$.
Theorem 1 (Linearity): Let $\varphi, \psi$ be simple functions on $E$ with finite measure. For any $\alpha, \beta \in \mathbb{R}$, $\int_E (\alpha\varphi + \beta\psi) = \alpha\int_E \varphi + \beta\int_E \psi$. Proof: It suffices to prove (a) $\int_E \alpha\varphi = \alpha\int_E \varphi$, and (b) $\int_E (\varphi + \psi) = \int_E \varphi + \int_E \psi$.
For (a), notice that $\alpha\varphi = \sum_k \alpha a_k \cdot I\{x \in E_k\}$. Therefore, $\int_E \alpha\varphi = \sum_k \alpha a_k\, m(E_k) = \alpha \int_E \varphi$. This proves (a).
For (b), assume $\varphi = \sum_{i=1}^n a_i \cdot I\{x \in A_i\}$ and $\psi = \sum_{j=1}^m b_j \cdot I\{x \in B_j\}$. Let $\{E_k\}_{k=1}^K$ be the common refinement of $\{A_i\}$ and $\{B_j\}$, i.e. the nonempty sets of the form $A_i \cap B_j$. It is easily shown that each $E_k$ is measurable, and for any $k \neq l$, $E_k$ and $E_l$ are disjoint. Thus, $\varphi$ and $\psi$ can be rewritten as $$ \varphi(x) = \sum_{k=1}^K a'_k \cdot I\{x \in E_k\}, \qquad \psi(x) = \sum_{k=1}^K b'_k \cdot I\{x \in E_k\} $$ $$ \int_E (\varphi + \psi) = \sum_{k=1}^{K} (a'_k + b'_k) \cdot m(E_k) = \sum_{k=1}^{K} a'_k\, m(E_k) + \sum_{k=1}^{K} b'_k\, m(E_k) = \int_E \varphi + \int_E \psi $$
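As a quick numerical sanity check of the definition, here is a minimal Python sketch (the representation and names are mine, not from the notes): a simple function is stored as the list of its values paired with the measures of the sets on which they are taken.

```python
def integral_simple(pairs):
    """Lebesgue integral of a simple function given as [(a_k, m(E_k)), ...]:
    returns sum over k of a_k * m(E_k)."""
    return sum(a * m for a, m in pairs)

# phi takes value 2 on a set of measure 0.5 and value -1 on a set of measure 1.5
phi = [(2.0, 0.5), (-1.0, 1.5)]
psi = [(1.0, 2.0)]  # constant 1 on a set of measure 2

# linearity as in Theorem 1: phi + psi expressed on a common refinement
phi_plus_psi = [(3.0, 0.5), (0.0, 1.5)]
print(integral_simple(phi_plus_psi))  # -0.5 + 2.0 = 1.5
```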
Theorem 2 (Monotonicity): Let $\varphi, \psi$ be simple functions on $E$. If $\varphi \leq \psi$ on $E$, then $\int_E \varphi \leq \int_E \psi$.
Proof: Since $\psi - \varphi \geq 0$, $\int_E (\psi - \varphi) \geq 0$. By linearity in Theorem 1, $\int_E \psi - \int_E \varphi = \int_E (\psi - \varphi) \geq 0$.
$f$ is bounded and $E$ has finite measure
Let $f$ be a bounded function defined on a measurable set $E$ with finite measure; that is, there is some $M \geq 0$ s.t. $|f| \leq M$ on $E$.
We define its upper Lebesgue integral over $E$ as $\overline{\int_E} f = \inf \left\{ \int_E \psi : \psi \text{ simple}, \psi \geq f \right\}$. We will often abbreviate this to $\overline{\int} f$. We define its lower Lebesgue integral over $E$ as $\underline{\int_E} f = \sup \left\{ \int_E \varphi : \varphi \text{ simple}, \varphi \leq f \right\}$. Similarly, we abbreviate this to $\underline{\int} f$.
You can easily prove
Lemma 3: $\underline{\int_E} f \leq \overline{\int_E} f$.
We say the Lebesgue integral of $f$ exists if $\underline{\int_E} f = \overline{\int_E} f$. We write the integral as $\int_E f$ and have $\int_E f = \underline{\int_E} f = \overline{\int_E} f$.
Theorem 4: Let $f$ be a bounded function over a measurable set $E$ with finite measure. If $f$ is measurable, then $f$ is integrable over $E$, i.e. $\int_E f$ exists.
Proof: By the Simple Approximation Lemma (Lemma 3 in Note LIR3), for any $n$ there exist simple functions $\varphi_n \leq f \leq \psi_n$ s.t. $0 \leq \psi_n - \varphi_n \leq 1/n$ on $E$. Hence, $0 \leq \overline{\int_E} f - \underline{\int_E} f \leq \int_E (\psi_n - \varphi_n) \leq m(E)/n$. The above inequality holds for arbitrarily large $n$. Therefore, we must have $\overline{\int_E} f = \underline{\int_E} f$.
Theorem 5 (Linearity): Let $f, g$ be bounded measurable functions on $E$ with finite measure. For any $\alpha, \beta \in \mathbb{R}$, $\int_E (\alpha f + \beta g) = \alpha \int_E f + \beta \int_E g$.
Proof: Since $f, g$ are bounded and measurable, $\alpha f + \beta g$ is also bounded and measurable (Theorem 7 in Note LIR2). Hence $\alpha f + \beta g$ is integrable over $E$ by Theorem 4.
First we prove $\int_E \alpha f = \alpha \int_E f$. If $\alpha > 0$, observe that $\psi$ ranges over the simple functions with $\psi \geq f$ iff $\alpha\psi$ ranges over the simple functions $\geq \alpha f$, and $\int_E \alpha\psi = \alpha\int_E \psi$. Since $\overline{\int_E} \alpha f = \inf_{\psi \geq f} \int_E \alpha\psi = \alpha \inf_{\psi \geq f} \int_E \psi$, we have $\int_E \alpha f = \alpha \int_E f$.
If $\alpha < 0$, it suffices to prove $\int_E (-f) = -\int_E f$. By similar arguments, we can see that $\psi \geq -f$ iff $-\psi \leq f$, so $\overline{\int_E}(-f) = \inf_{\psi \geq -f} \int_E \psi = -\sup_{\varphi \leq f} \int_E \varphi = -\underline{\int_E} f = -\int_E f$. This completes the proof of $\int_E \alpha f = \alpha \int_E f$ for any $\alpha \neq 0$. (The case $\alpha = 0$ is trivial.)
Next, we proceed to prove $\int_E (f + g) = \int_E f + \int_E g$. Observe that for arbitrary simple functions $\psi_1 \geq f$ and $\psi_2 \geq g$, $\psi_1 + \psi_2$ is simple and $\psi_1 + \psi_2 \geq f + g$, so $\int_E (f+g) \leq \int_E \psi_1 + \int_E \psi_2$. By arbitrariness of $\psi_1$ and $\psi_2$, we must have $\int_E (f+g) \leq \int_E f + \int_E g$. Similarly, using simple functions $\varphi_1 \leq f$ and $\varphi_2 \leq g$, we have $\int_E (f+g) \geq \int_E f + \int_E g$. Together these establish $\int_E (f+g) = \int_E f + \int_E g$.
Corollary 6 (Monotonicity): Let $f, g$ be bounded measurable functions on $E$ with finite measure. If $f \leq g$ on all of $E$, then $\int_E f \leq \int_E g$.
Proof: Observe $h = g - f$ is a non-negative bounded measurable function on $E$. Then $\int_E h \geq \int_E \varphi$ where $\varphi \equiv 0$ on $E$, so $\int_E h \geq 0$. Thus $\int_E g - \int_E f = \int_E (g - f) \geq 0$.
Corollary 7: If $f$ is bounded and integrable over a set $E$ of finite measure, then $\left| \int_E f \right| \leq \int_E |f|$.
Proof: $|f|$ is bounded and measurable, hence integrable over $E$. Observe $-|f| \leq f \leq |f|$. So by linearity and monotonicity, we have $$
-\int_E |f| \leq \int_E f \leq \int_E |f| $$ which leads to the desired result.
Corollary 8 (Additivity of Integral over Domains): Let $f$ be a bounded and measurable function on a set $E$ of finite measure. Let $A \subseteq E$ be a measurable subset. Then $\int_E f = \int_A f + \int_{E \setminus A} f$.
Proof: Define $g$ as $g = f$ on $A$ and $g = 0$ on $E \setminus A$. Similarly, define $h = f$ on $E \setminus A$ and $h = 0$ on $A$. Then it is easily shown that $g, h$ are both measurable on $E$. Observe $f = g + h$, $\int_E g = \int_A f$, and $\int_E h = \int_{E \setminus A} f$. So by linearity we have $\int_E f = \int_A f + \int_{E \setminus A} f$.
Corollary 10: Let $f$ be a bounded and measurable function on a set $E$ of finite measure. If $m(E) = 0$, then $\int_E f = 0$.
Proof: For any simple $\psi \geq f$ on $E$, its integral $\int_E \psi = \sum_k a_k\, m(E_k) = 0$, since each $m(E_k) \leq m(E) = 0$. The same holds for simple functions below $f$, so $\int_E f = 0$.
Theorem 9 (Uniformly Convergent Sequence): Let $\{f_n\}$ be a sequence of bounded and measurable functions on a set $E$ of finite measure. If $f_n \to f$ uniformly on $E$, then $\lim_n \int_E f_n = \int_E f$.
Proof: For any $\varepsilon > 0$, since $f_n \to f$ uniformly, there exists some $N$ s.t. $|f_n - f| < \varepsilon / m(E)$ on $E$ whenever $n \geq N$. So $\left| \int_E f_n - \int_E f \right| \leq \int_E |f_n - f| \leq \varepsilon$ for all $n \geq N$.
If $f_n \to f$ only pointwise a.e. on $E$, then we need some extra conditions to bound $\left| \int_E f_n - \int_E f \right|$.
Theorem 10 (Pointwise Convergent and Uniformly Bounded Sequence): Let $\{f_n\}$ be a sequence of bounded and measurable functions on a set $E$ of finite measure. If $f_n \to f$ pointwise a.e. on $E$ and in addition $\{f_n\}$ is uniformly bounded, i.e. there is some $M \geq 0$ s.t. $|f_n| \leq M$ on $E$ for all $n$, then $\lim_n \int_E f_n = \int_E f$.
Proof:
First, by Theorem 1 in Note LIR3, $f$ is measurable on $E$ and thus integrable over $E$.
Let $E_0$ be the set on which $f_n$ does not converge to $f$, and $E_1 = E \setminus E_0$. Note $m(E_0) = 0$. If we were able to establish the result for $E_1$, then it holds for $E$, since by Corollary 10, $\int_{E_0} f$ and $\int_{E_0} f_n$ are all $0$.
Hence we may assume $f_n \to f$ pointwise on $E$. It is easily shown that $|f| \leq M$ on $E$. By Egoroff's Theorem, for any $\varepsilon > 0$, there is an $N$ and a closed set $F \subseteq E$ s.t. $m(E \setminus F) < \varepsilon$ and $|f_n - f| < \varepsilon$ on $F$ whenever $n \geq N$. Therefore, $\left| \int_E f_n - \int_E f \right| \leq \int_F |f_n - f| + \int_{E \setminus F} |f_n - f| \leq \varepsilon\, m(E) + 2M\varepsilon$, which can be made arbitrarily small.
$f$ is non-negative and $E$ can have infinite measure
For a measurable function $f$ defined on a measurable set $E$, we say $f$ has finite support $E_0 \subseteq E$ if $m(E_0) < \infty$ and $f = 0$ on $E \setminus E_0$. We say $f$ vanishes outside $E_0$.
If $f$ has finite support $E_0$ and is bounded on $E_0$, then we define $\int_E f = \int_{E_0} f$. The latter is well defined by the last section.
To define $\int_E f$ when $f$ can be unbounded and $E$ can have infinite measure, we use a set of non-negative bounded functions that have finite support to "approach" $f$ from below. Formally, $\int_E f = \sup \left\{ \int_E h : h \text{ bounded, of finite support, } 0 \leq h \leq f \text{ on } E \right\}$.
When $\int_E f < \infty$, we say $f$ is integrable over $E$.
Theorem 11 (Equivalent Definition): If $f$ is a non-negative measurable function defined on a measurable set $E$, then $\int_E f = \sup \left\{ \int_E \varphi : \varphi \text{ simple with finite support, } 0 \leq \varphi \leq f \right\}$.
Proof: Let $S$ denote the supremum over such simple functions, and let $I = \int_E f$ as defined above. Clearly every such $\varphi$ is bounded with finite support, and thus $S \leq I$. To prove the converse, we need to show that for every bounded $h$ of finite support with $0 \leq h \leq f$ and every $\varepsilon > 0$, there exists some simple $\varphi \leq f$ s.t. $\int_E \varphi \geq \int_E h - \varepsilon$.
Note by Simple Approximation Theorem, there is a sequence in s.t. pointwise a.e. on and for all . Without loss of generality, we have pointwise on , where is the finite support of .
If is bounded on , then is uniformly bounded on . So
If is unbounded on , then let and . So for .
Since is bounded, is uniformly bounded on . Hence
On the other hand, is bounded on hence . So there is some such that and
Note is increasing and . By (1) and (2) we can choose sufficiently large such that .
Theorem 12 (Linearity and Monotonicity): Let $f, g$ be non-negative measurable functions over a measurable set $E$. For any $\alpha, \beta > 0$, $\int_E (\alpha f + \beta g) = \alpha \int_E f + \beta \int_E g$.
If $f \leq g$ on $E$, then $\int_E f \leq \int_E g$.
Proof: First let's prove $\int_E \alpha f = \alpha \int_E f$. For short, let's write $I = \int_E f$ and $I' = \int_E \alpha f$. We want to show $I' = \alpha I$. By linearity of the integral of bounded functions, for any bounded $h$ of finite support with $0 \leq h \leq f$ we take $h' = \alpha h$. Then $0 \leq h' \leq \alpha f$ and $\int_E h' = \alpha \int_E h$. So we can easily show $I' \geq \alpha I$.
Conversely, for any bounded $h'$ of finite support s.t. $0 \leq h' \leq \alpha f$, we take $h = h'/\alpha$. Then $0 \leq h \leq f$ and $\int_E h = \int_E h' / \alpha$. Thus $I \geq I'/\alpha$, i.e. $I' \leq \alpha I$. So we have established $\int_E \alpha f = \alpha \int_E f$.
Next we prove $\int_E (f + g) = \int_E f + \int_E g$. Consider any bounded $h_1 \leq f$ with finite support $E_1$, and any bounded $h_2 \leq g$ with finite support $E_2$. Take $h = h_1 + h_2$. Then $h$ vanishes outside $E_1 \cup E_2$ and we can easily show that $\int_E h = \int_E h_1 + \int_E h_2$. Clearly $0 \leq h \leq f + g$ on $E$. Hence we have $\int_E (f+g) \geq \int_E f + \int_E g$. Conversely, consider a bounded $h$ with finite support $E_0$ s.t. $0 \leq h \leq f + g$. Take $h_1 = \min(h, f)$ and $h_2 = h - h_1$, so $0 \leq h_1 \leq f$ on $E$. On the set where $h_1 = h$ we have $h_2 = 0 \leq g$, and on the set where $h_1 = f$ we have $h_2 = h - f \leq g$. So $0 \leq h_2 \leq g$ on $E$, and $h_1, h_2$ are bounded with finite support $E_0$. Thus, $\int_E h = \int_E h_1 + \int_E h_2 \leq \int_E f + \int_E g$. This proves $\int_E (f+g) \leq \int_E f + \int_E g$.
Combining the two inequalities proves the linearity of integration.
Finally, assume $f \leq g$. Then any bounded $h$ of finite support with $0 \leq h \leq f$ also satisfies $0 \leq h \leq g$. By definition of the integral, the supremum on the left-hand side must be less than or equal to the supremum on the right-hand side. Hence $\int_E f \leq \int_E g$.
Corollary 13 (Additivity of Integral over Domains): Let $f$ be a non-negative measurable function on $E$, and let $A \subseteq E$ be measurable. Then $\int_E f = \int_A f + \int_{E \setminus A} f$.
Let $\{f_n\}$ be a sequence of functions defined on a measurable set $E$. There are several different types of convergence.
Converge uniformly: we say $f_n$ converges to $f$ uniformly on $E$, denoted by $f_n \rightrightarrows f$, if for any $\varepsilon > 0$ there is an $N$ s.t. $|f_n(x) - f(x)| < \varepsilon$ whenever $n \geq N$, for any $x \in E$.
Converge pointwise: we say $f_n$ converges to $f$ pointwise, denoted by $f_n \to f$, if for any $x \in E$, $\lim_n f_n(x) = f(x)$. We say $f_n$ converges to $f$ pointwise almost everywhere if $f_n \to f$ on $E \setminus E_0$ and $m(E_0) = 0$.
Converge in measure: we say $f_n$ converges to $f$ in measure, denoted by $f_n \xrightarrow{m} f$, if for any $\varepsilon > 0$ and $\delta > 0$ there is an $N$ such that whenever $n \geq N$ we have $m(\{x \in E : |f_n(x) - f(x)| \geq \varepsilon\}) < \delta$.
The idea of "convergence in measure" is that the measure of the set of points $x$ at which $f_n(x)$ stays far from $f(x)$ gets smaller and smaller. But this does not guarantee that $f_n(x)$ converges to $f(x)$ at every point, as in "converge pointwise" or even in "converge pointwise almost everywhere". In probability theory, "converge in measure" is also known as "converge in probability", and "converge pointwise (almost everywhere)" is also known as "converge almost surely".
In probability theory, "converge pointwise (almost everywhere)" can also be denoted by $f_n \xrightarrow{a.s.} f$, while "converge in probability" can be denoted by the statement that for any $\varepsilon > 0$, $\lim_n P(|f_n - f| \geq \varepsilon) = 0$.
"Converge uniformly" is the strongest of the three, as it not only requires convergence at each point but also requires the points to "converge at the same speed", i.e. after a certain threshold $N$, all $f_n(x)$ must be within $\varepsilon$ of $f(x)$ simultaneously.
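A tiny numeric illustration of the gap between pointwise and uniform convergence (the example $f_n(x) = x^n$ on $[0, 1)$ is standard; the code is only a sketch):

```python
def f_n(n, x):
    """f_n(x) = x^n: converges to 0 pointwise on [0, 1), but not uniformly,
    since the sup of f_n over [0, 1) equals 1 for every n."""
    return x ** n

# at a fixed point x = 0.5 the sequence is already tiny at n = 50
print(f_n(50, 0.5) < 1e-10)      # True
# but for every n there are points where f_n is still close to 1
print(f_n(50, 0.99999) > 0.999)  # True
```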
Now we see how convergence preserves measurability.
Theorem 1: Let $\{f_n\}$ be a sequence of measurable functions on $E$. If $f_n \to f$ pointwise a.e., then $f$ is measurable on $E$.
Proof: For any $c \in \mathbb{R}$, consider the set $\{x \in E : f(x) < c\}$. At the points of convergence, observe $\{x : f(x) < c\} = \bigcup_{k=1}^\infty \bigcup_{N=1}^\infty \bigcap_{n \geq N} \{x : f_n(x) < c - 1/k\}$, which is measurable (the exceptional set has measure zero and does not affect measurability).
Approximation
A function $\varphi$, defined on a measurable set $E$, is said to be simple if it only takes a finite number of values $a_1, \ldots, a_n$ and each $E_k = \{x \in E : \varphi(x) = a_k\}$ is measurable. So we often write $\varphi$ as $\varphi(x) = \sum_{k=1}^n a_k \cdot I\{x \in E_k\}$, where $I\{\cdot\}$ is an indicator function, i.e. $I\{P\} = 1$ if $P$ is true and $0$ otherwise.
For a measurable function, we can use a sequence of simple functions to approximate it. Formally, we have the following theorem.
Theorem 2 (Simple Approximation Theorem): Let $f$ be a function defined on $E$. Then $f$ is measurable, if and only if, there is a sequence of simple functions $\{\varphi_n\}$ such that $|\varphi_n| \leq |f|$ for any $n$ and $\varphi_n \to f$ pointwise. If $f$ is non-negative, we may choose $\{\varphi_n\}$ to be increasing.
To prove Theorem 2, we first prove a lemma.
Lemma 3 (Simple Approximation Lemma): Let $f$ be a bounded measurable function on $E$, i.e. there is an $M$ such that $|f(x)| \leq M$ for any $x \in E$. Then for any $\varepsilon > 0$, there are simple functions $\varphi_\varepsilon, \psi_\varepsilon$ such that $\varphi_\varepsilon \leq f \leq \psi_\varepsilon$ and $0 \leq \psi_\varepsilon - \varphi_\varepsilon < \varepsilon$.
Proof: Since $f$ is bounded by $M$, for $\varepsilon > 0$ partition the interval $[-M, M]$ into $[y_0, y_1), [y_1, y_2), \ldots, [y_{n-1}, y_n]$, where $y_0 = -M$, $y_n = M$, and for each $k$, $y_k - y_{k-1} < \varepsilon$.
Let $E_k = \{x \in E : y_{k-1} \leq f(x) < y_k\}$. Then $E_k$ is measurable. Define $\varphi_\varepsilon = \sum_k y_{k-1} \cdot I\{x \in E_k\}$ and $\psi_\varepsilon = \sum_k y_k \cdot I\{x \in E_k\}$.
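The partition in the proof can be sketched in Python (a toy check, with function and names assumed): split $[-M, M]$ into intervals of length at most $\varepsilon$ and read off the lower and upper step values bracketing $f(x)$.

```python
import math

def simple_approx(f, M, eps, xs):
    """For each sample point x, return (phi(x), psi(x)) with
    phi(x) <= f(x) <= psi(x) and psi(x) - phi(x) <= eps."""
    n = math.ceil(2 * M / eps)   # number of partition intervals
    h = 2 * M / n                # interval length, h <= eps
    out = []
    for x in xs:
        k = min(int((f(x) + M) / h), n - 1)  # interval containing f(x)
        lo = -M + k * h
        out.append((lo, lo + h))
    return out

xs = [0.0, 0.3, 0.9]
pairs = simple_approx(lambda x: x * x, 1.0, 0.1, xs)
print(all(lo <= x * x <= hi for (lo, hi), x in zip(pairs, xs)))  # True
```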
Proof of Theorem 2: By Theorem 1, if a sequence of simple (hence measurable) functions converges to $f$ pointwise, then $f$ is measurable. It remains to prove the converse.
Assume $f$ is non-negative. Note $f$ can be unbounded. Let $E_n = \{x \in E : f(x) \leq n\}$ and $f_n = f \cdot I\{x \in E_n\}$
(i.e. $f_n(x) = f(x)$ if $f(x) \leq n$ and $f_n(x) = 0$ if $f(x) > n$).
So by Lemma 3, there are simple functions $0 \leq \varphi_n \leq f_n$ defined on $E$ such that $0 \leq f_n - \varphi_n < 1/n$, and $\varphi_n \leq f$. We claim that $\tilde\varphi_n = \max(\varphi_1, \ldots, \varphi_n)$ is the increasing sequence of simple functions as desired.
To see this, consider any $x$ with $f(x) < \infty$. Note $E_n$ is increasing with $n$. So there is an $N_1$ such that $x \in E_n$ whenever $n \geq N_1$. For any $\varepsilon > 0$ there is an $N_2$ such that $1/N_2 < \varepsilon$. Take $N = \max(N_1, N_2)$. So for $n \geq N$, we have (1) $x \in E_n$, hence $f_n(x) = f(x)$, and (2) $0 \leq f(x) - \tilde\varphi_n(x) \leq f_n(x) - \varphi_n(x) < 1/n < \varepsilon$. Hence $\tilde\varphi_n \to f$ pointwise. This proves the case where $f$ is non-negative.
Assume $f$ is non-positive; then $-f$ is non-negative. By the above arguments, there is an increasing sequence $\{\varphi_n\}$ s.t. $0 \leq \varphi_n \leq -f$ and $\varphi_n \to -f$ pointwise. So we take $-\varphi_n$, and it is easily shown that it meets the requirement.
Finally, let $f$ be arbitrary. Then $f = f^+ - f^-$, where $f^+ = \max(f, 0)$ and $f^- = \max(-f, 0)$. The corresponding sequences are $\{\varphi_n^+\}$ and $\{\varphi_n^-\}$. Take $\varphi_n = \varphi_n^+ - \varphi_n^-$. It is easily shown that $\varphi_n \to f$ pointwise and $|\varphi_n| \leq |f|$.
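For a non-negative $f$, the increasing approximants can also be written in the common dyadic form $\varphi_n = \min(\lfloor 2^n f \rfloor / 2^n,\; n)$; a small sketch (this exact formula is my assumption, equivalent in spirit to the construction above):

```python
def phi(n, f, x):
    """n-th dyadic simple approximant of a non-negative f at x:
    floor f(x) down to a multiple of 2^-n, capped at n."""
    return min(int((2 ** n) * f(x)) / (2 ** n), n)

g = lambda x: x * x
vals = [phi(n, g, 0.9) for n in range(1, 8)]
print(vals)  # nondecreasing, staying below g(0.9) = 0.81
```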
Egoroff’s Theorem
Egoroff’s theorem is another example of Littlewood’s principles, i.e. every pointwise convergence is “almost” uniform convergence.
Formally,
Theorem 4 (Egoroff's Theorem): Let $E$ be a measurable set with finite measure. Let $\{f_n\}$ be a sequence of measurable functions on $E$ with $f_n \to f$ pointwise on $E$. Then for any $\varepsilon > 0$ there is a closed set $F \subseteq E$ such that $m(E \setminus F) < \varepsilon$ and $f_n \to f$ uniformly on $F$.
Again, we prove a lemma first.
Lemma 5: Let $E, \{f_n\}, f$ satisfy the conditions stated in Egoroff's Theorem. Then for any $\eta > 0$ and $\delta > 0$ there exist a measurable $A \subseteq E$ and an $N$ such that $m(E \setminus A) < \delta$ and $|f_n - f| < \eta$ on $A$ whenever $n \geq N$.
Proof: Since $f_n \to f$ pointwise, for any $x \in E$ there is some $N_x$ such that $|f_n(x) - f(x)| < \eta$ whenever $n \geq N_x$. Now we "group" these $x$ into different sets as follows.
Let $A_N = \{x \in E : |f_n(x) - f(x)| < \eta \text{ for all } n \geq N\}$. Then $\{A_N\}$ is increasing and $\bigcup_N A_N = E$. So by continuity of Lebesgue measure, we have $\lim_N m(A_N) = m(E)$. Since $m(E) < \infty$, there is some $N$ such that $m(E) - m(A_N) < \delta$. Let $A$ be $A_N$, and the constructed $A, N$ are what we desire.
Proof of Egoroff's Theorem: For any $\varepsilon > 0$ and each $k \geq 1$, by Lemma 5, take $\eta = 1/k$ and $\delta = \varepsilon/2^{k+1}$. Then there are some $A_k \subseteq E$ and $N_k$ s.t. $m(E \setminus A_k) < \varepsilon/2^{k+1}$ and $|f_n - f| < 1/k$ on $A_k$ whenever $n \geq N_k$.
Note each $A_k$ is measurable. So take $A = \bigcap_{k=1}^\infty A_k$, which is also a measurable set. Observe that $f_n \to f$ uniformly on $A$, and $m(E \setminus A) \leq \sum_k m(E \setminus A_k) < \varepsilon/2$. Since $A$ is measurable, by inner approximation (Theorem 11 in Note LIR1), there is a closed set $F \subseteq A$ s.t. $m(A \setminus F) < \varepsilon/2$. Therefore $f_n \to f$ uniformly on the closed set $F$ and $m(E \setminus F) < \varepsilon$.
Let $E$ be a measurable set and let an extended-value function $f$ be defined on $E$, i.e. $f: E \to [-\infty, \infty]$.
We say $f$ is a measurable function if its inverse image of any Borel set is measurable. Formally, we have the following equivalent characterizations.
Theorem 1: The following statements are equivalent. If $f$ satisfies any of them, we say $f$ is a measurable function.
For any $c \in \mathbb{R}$, the set $\{x \in E : f(x) > c\}$ is measurable.
For any $c \in \mathbb{R}$, the set $\{x \in E : f(x) \geq c\}$ is measurable.
For any $c \in \mathbb{R}$, the set $\{x \in E : f(x) < c\}$ is measurable.
For any $c \in \mathbb{R}$, the set $\{x \in E : f(x) \leq c\}$ is measurable.
Proof: Assume Item 1 holds. Let $c \in \mathbb{R}$. So $\{x : f(x) \geq c\} = \bigcap_{k=1}^\infty \{x : f(x) > c - 1/k\}$. Each set in the intersection is measurable, so Item 2 holds.
Assume Item 2 holds. The set in Item 3 is a complement of the set in Item 2. Hence Item 3 holds.
Assume Item 3 holds. By taking similar approach in proving Item 1 implies Item 2, we obtain the fact that Item 4 holds.
Assume Item 4 holds, the set in Item 1 is a complement of the set in Item 4. So Item 1 holds.
Corollary 2: If $f$ is a measurable function defined on the measurable set $E$, then for any $c \in [-\infty, \infty]$, the set $\{x \in E : f(x) = c\}$ is measurable.
Proof: If $c \in \mathbb{R}$, note $\{f = c\} = \{f \geq c\} \cap \{f \leq c\}$. The two sets on the right-hand side are measurable, so is the one on the left-hand side.
If $c = \infty$, consider $\{f = \infty\} = \bigcap_{k=1}^\infty \{f > k\}$. Then it is a countable intersection of measurable sets, and it is measurable. Similarly, the conclusion holds when $c = -\infty$.
Corollary 3: A real-valued $f$ is a measurable function defined on the measurable set $E$, if and only if, for any open set $O \subseteq \mathbb{R}$, $f^{-1}(O)$ is measurable.
Proof: For any $c$, $(c, \infty)$ is an open set. So $f^{-1}((c, \infty)) = \{f > c\}$ is measurable. Hence $f$ is a measurable function.
Conversely, suppose $f$ is measurable and consider an open set $O$. It can be expressed as a countable union of disjoint open intervals, i.e. $O = \bigcup_k (a_k, b_k)$. Let $A_k = \{f > a_k\}$ and $B_k = \{f < b_k\}$. Then $A_k$ and $B_k$ are both measurable sets, and $f^{-1}((a_k, b_k)) = A_k \cap B_k$ is also measurable.
So $f^{-1}(O) = \bigcup_k (A_k \cap B_k)$ is measurable.
Theorem 4 (Continuity implies measurability): Let $f$ be a continuous function on a measurable set $E$. Then $f$ is measurable on $E$.
Proof: Since $f$ is continuous, for any open set $O$ we have $f^{-1}(O) = E \cap U$ where $U$ is open. Thus $f^{-1}(O)$ is measurable. By Corollary 3, $f$ is measurable.
Theorem 5: Let $f$ be a monotonic function on an interval $I$. Then $f$ is measurable.
Proof: Assume $f$ is increasing on $I$.
For any $c$, by monotonicity the set $\{x \in I : f(x) > c\}$ is an interval (possibly empty or all of $I$): if $x_0$ belongs to the set, so does every $x \in I$ with $x > x_0$. Intervals are measurable, so $f$ is measurable. The same argument applies whether $I$ is open, closed, or half-open.
If $f$ is decreasing, similarly we can show that it is also measurable.
Next, suppose $f$ is measurable; if other functions have certain relationships with $f$, what about their measurability?
Theorem 6: Let $f$ be a function on $E$.
If $g$ is measurable on $E$ and $f = g$ almost everywhere on $E$, then $f$ is measurable on $E$.
Let $D$ be a measurable subset of $E$. Then $f$ is measurable on $E$ if and only if $f$ is measurable on both $D$ and $E \setminus D$.
Proof: For statement (1), let $E_0$ be the set on which $f \neq g$, so $m(E_0) = 0$. Then for any $c$, the set $\{x \in E : f(x) > c\} = \left( \{x \in E : g(x) > c\} \setminus E_0 \right) \cup \{x \in E_0 : f(x) > c\}$, which is measurable, since any subset of a measure-zero set is measurable. This proves statement (1).
For statement (2), for any $c$ let $A = \{x \in D : f(x) > c\}$ and $B = \{x \in E \setminus D : f(x) > c\}$. If $f$ is measurable on $E$ then $\{x \in E : f(x) > c\}$ is measurable. Thus both $A = \{f > c\} \cap D$ and $B = \{f > c\} \setminus D$ are measurable, which implies $f$ is measurable on both $D$ and $E \setminus D$. Conversely, if $f$ is measurable on $D$ and $E \setminus D$, the sets $A$ and $B$ are measurable. So $\{x \in E : f(x) > c\} = A \cup B$ is measurable.
We shall now only consider real-valued functions, i.e. $f: E \to \mathbb{R}$, from a practical point of view and to simplify a lot of proofs.
Theorem 7: Let $f, g$ be measurable functions on $E$.
For any $\alpha, \beta \in \mathbb{R}$, $\alpha f + \beta g$ is measurable on $E$.
$fg$ is measurable on $E$.
Proof: For statement (1), it suffices to prove (1a) $\alpha f$ is measurable, and (1b) $f + g$ is measurable.
For any $c$, if $\alpha > 0$ the set $\{\alpha f > c\} = \{f > c/\alpha\}$ is measurable; if $\alpha < 0$, $\{\alpha f > c\} = \{f < c/\alpha\}$ is measurable; the case $\alpha = 0$ is trivial. This proves (1a).
Now consider the set $\{x : f(x) + g(x) > c\}$. If $f(x) + g(x) > c$, then $f(x) > c - g(x)$. Note the set of rationals $\mathbb{Q}$ is dense in $\mathbb{R}$. So there is a $q \in \mathbb{Q}$ s.t. $c - g(x) < q < f(x)$. Thus
$$ \{x : f(x) + g(x) > c\} = \bigcup_{q \in \mathbb{Q}} \left( \{x : f(x) > q\} \cap \{x : g(x) > c - q\} \right) $$
Each component under the union is measurable and $\mathbb{Q}$ is countable. So the set on the left-hand side is also measurable. This proves (1b).
For statement (2), we consider $fg = \frac{1}{2}\left[ (f+g)^2 - f^2 - g^2 \right]$. Since we have established statement (1), it remains to prove that $h^2$ is measurable whenever $h$ is.
Consider the set $\{x : h(x)^2 > c\}$. If $c < 0$, it is all of $E$, which is measurable. If $c \geq 0$, $\{h^2 > c\} = \{h > \sqrt{c}\} \cup \{h < -\sqrt{c}\}$, which is also measurable.
Now we consider composition.
Theorem 8: Let $g$ be a measurable function on $E$ and let $f$ be continuous on $\mathbb{R}$. Then the composition $f \circ g$ is measurable on $E$.
Proof: Consider any open set $O$. Then $(f \circ g)^{-1}(O) = g^{-1}(f^{-1}(O))$. Since $f$ is continuous on $\mathbb{R}$, $f^{-1}(O)$ is open. Thus by Corollary 3, $g^{-1}(f^{-1}(O))$ is measurable. Again by Corollary 3, $f \circ g$ is a measurable function.
Theorem 9: Let $f$ and $g$ be measurable on $E$. Then
$\max\{f, g\}$ is measurable.
$\min\{f, g\}$ is measurable.
$f^+ = \max\{f, 0\}$ is measurable.
$f^- = \max\{-f, 0\}$ is measurable.
$|f|$ is measurable.
Proof: The set $\{x : \max\{f, g\}(x) > c\} = \{x : f(x) > c\} \cup \{x : g(x) > c\}$ is measurable. This proves statement (1).
Similarly, the set $\{x : \min\{f, g\}(x) > c\} = \{x : f(x) > c\} \cap \{x : g(x) > c\}$ is measurable. This proves statement (2).
The set $\{x : f^+(x) > c\}$ equals $\{x : f(x) > c\}$ if $c \geq 0$, and equals $E$ if $c < 0$. In either case, it is measurable. This proves statement (3).
The set $\{x : f^-(x) > c\}$ equals $\{x : f(x) < -c\}$ if $c \geq 0$, and equals $E$ if $c < 0$. In either case, it is measurable. This proves statement (4).
For statement (5), note $|f| = f^+ + f^-$. By (3), (4) and Theorem 7 (linearity preserves measurability), $|f|$ is measurable.
A CFG is said to be in Greibach Normal Form (GNF) if all its rules are of the following form:
$A \rightarrow aA_1A_2\cdots A_k$, where $A, A_1, \ldots, A_k$ are nonterminals, $a$ is some terminal, and $k \geq 0$.
And if $\varepsilon \in L(G)$, then we can include
$S \rightarrow \varepsilon$, where $S$ does not appear on the right-hand side of any rule.
Note in GNF, no left recursion can happen. This will be helpful for some top-down parsers.
To show that any CFG can be converted to GNF, we will construct an elegant mapping with the help of least fixed point property.
Least Fixed Point
Let $(D, \sqsubseteq)$ be a partially ordered set. We say a sequence $\{x_n\}$ is a chain if $x_n \sqsubseteq x_{n+1}$ for all $n$. When a chain has a least upper bound, we denote it by $\bigsqcup_n x_n$.
We say $D$ is a complete poset iff it has a least element $\bot$ and every chain has a least upper bound in $D$.
Now let $D$ and $E$ be complete posets. A function $f: D \to E$ is said to be continuous iff
it is monotonic, i.e. if $x \sqsubseteq y$, then $f(x) \sqsubseteq f(y)$
$f\left(\bigsqcup_n x_n\right) = \bigsqcup_n f(x_n)$ for every chain $\{x_n\}$ in $D$
Theorem 1 (Least Fixed Point): Let $D$ be a complete poset. Then every continuous function $f: D \to D$ has a least fixed point.
Proof: Let $\bot$ be the least element. We will write $f^0(\bot) = \bot$ and $f^{n+1}(\bot) = f(f^n(\bot))$.
The trick is to construct a chain as follows: $\bot \sqsubseteq f(\bot) \sqsubseteq f^2(\bot) \sqsubseteq \cdots$
To show it is a chain, note that $f$ is continuous and hence monotonic. Since $\bot$ is the least element, we have $\bot \sqsubseteq f(\bot)$. Since $f$ is monotonic, we have $f(\bot) \sqsubseteq f^2(\bot)$. Continuing by induction on $n$, we prove the sequence is a chain.
Since $D$ is a complete poset, the chain has a least upper bound in $D$, denoted by $x^* = \bigsqcup_n f^n(\bot)$.
$x^*$ is the least fixed point we are looking for. To see this, recall $f$ is continuous. Thus $f(x^*) = f\left(\bigsqcup_n f^n(\bot)\right) = \bigsqcup_n f^{n+1}(\bot) = x^*$, which shows $x^*$ is a fixed point. For any fixed point $y$, since $\bot \sqsubseteq y$ we have $f^n(\bot) \sqsubseteq f^n(y) = y$ for any $n$. Thus $x^* \sqsubseteq y$. This proves $x^*$ is the least fixed point.
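The iteration $\bot, f(\bot), f^2(\bot), \ldots$ in the proof can be run directly whenever chains stabilize after finitely many steps, e.g. on subsets of a finite universe ordered by inclusion (a sketch with invented names):

```python
def least_fixed_point(f, bottom):
    """Kleene iteration: apply a monotone f starting from the least element
    until the chain stabilizes; the stable value is the least fixed point."""
    x = bottom
    while True:
        y = f(x)
        if y == x:
            return x
        x = y

# f(S) = {0} union {s+1 : s in S, s+1 < 5} on subsets of {0,...,4}
f = lambda S: frozenset({0} | {s + 1 for s in S if s + 1 < 5})
print(sorted(least_fixed_point(f, frozenset())))  # [0, 1, 2, 3, 4]
```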
Derivations at “Equilibrium”
For a CFG $G$, let $X_1, \ldots, X_n$ be its nonterminals. We can rearrange its production rules and group them by nonterminals as:
$X_1 \rightarrow \alpha_{11} + \alpha_{12} + \cdots$
…
$X_n \rightarrow \alpha_{n1} + \alpha_{n2} + \cdots \qquad (1)$
where each $\alpha_{ij}$ is a string over nonterminals and terminals, and "$+$" means "or". For example, if we have two rules $X \rightarrow a$ and $X \rightarrow bX$, we can write $X \rightarrow a + bX$. This notational simplicity later turns out to be an elegant way to represent CFG derivation by mimicking matrix multiplication and addition.
If we start deriving from some nonterminal, say $X_i$, what will the derived language $L_i$ look like? And how are its elements added to the set gradually?
At step 0, where no nonterminal is rewritten yet, $L_i$ can only include those $\alpha_{ij}$ that are pure terminal strings, i.e. that don't contain any nonterminals. The same goes for every other $L_j$. So "after step 0", the language derived from $X_i$ contains some initial strings only. Let's move 1 step further and look at $L_i$ again. Now the nonterminals in its right-hand sides start to "unfold". For example, if $X_j$ appears in some $\alpha_{ik}$, then it will unfold to all possible strings in the step-0 version of $L_j$. The same goes for all the other nonterminals.
So at each step, the language grows bigger and bigger. You may feel that there is some "limit" to how large $L_i$ can grow, some "equilibrium" beyond which all the $L_i$ "stop" deriving new strings. In the limit, the final tuple $(L_1, \ldots, L_n)$ will be some sort of "fixed point".
The above idea is exactly what we will construct next and utilize the least fixed point theorem.
Formally let’s give more notation and definition.
Let $V$ be the set of symbols and $\Sigma \subseteq V$ be the set of terminals. We use $N = V \setminus \Sigma$ to denote the set of nonterminals. Note a language is a subset of strings in $\Sigma^*$. Similarly, we use the letter $L$ to denote a subset of strings over nonterminals and terminals, i.e. $L \subseteq V^*$.
For any tuple of languages $L = (L_1, \ldots, L_n)$ we define a substitution $\widehat{L}$ as follows:
$\widehat{L}(a) = \{a\}$, for all terminals $a$
$\widehat{L}(X_i) = L_i$, where $X_i$ is a nonterminal
$\widehat{L}(\alpha \cdot \beta) = \widehat{L}(\alpha) \cdot \widehat{L}(\beta)$, where the dot operation means concatenating every pair of strings from the two sets. Sometimes we omit the dot sign for simplicity.
$\widehat{L}(\alpha + \beta) = \widehat{L}(\alpha) + \widehat{L}(\beta)$, where the plus operation means "or", i.e. we union the two sets
You can easily verify that $\widehat{L}$ is a well-defined function. Now we use it to define a mapping $F$. Assume the rules in grammar $G$ are written in the form of (1) above, and let $F(L_1, \ldots, L_n) = \left( \widehat{L}(\alpha_{11} + \alpha_{12} + \cdots),\; \ldots,\; \widehat{L}(\alpha_{n1} + \alpha_{n2} + \cdots) \right). \qquad (2)$
The definition in (2) is a formal way to specify what we have discussed. Given a tuple of languages $L = (L_1, \ldots, L_n)$, we use it to expand all nonterminals in the rules and thus obtain a new tuple of languages $F(L)$.
So if we start with $L^0 = (\emptyset, \ldots, \emptyset)$ and apply $F$, we will obtain $L^1 = F(L^0)$, where each component is the language derived after 1 step. We keep doing this and eventually obtain $L^\infty = \bigsqcup_k F^k(L^0)$. As you can guess, the first component of $L^\infty$ is the language derived from the start symbol $X_1$, i.e. $L(G)$, and the components of $L^\infty$ are exactly the least fixed point of $F$.
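The iteration $F^k(\emptyset, \ldots, \emptyset)$ can be simulated for a toy grammar (the grammar, names, and the length cap are mine; the cap only keeps the sets finite):

```python
RULES = {"S": [["a", "S"], ["b"]]}  # toy grammar: S -> aS | b

def step(L, max_len=4):
    """One application of F: rewrite every nonterminal on each right-hand
    side with the strings currently in L, keeping strings up to max_len."""
    new = {}
    for A, alts in RULES.items():
        out = set()
        for alt in alts:
            partial = {""}
            for sym in alt:
                repl = L[sym] if sym in RULES else {sym}
                partial = {p + r for p in partial for r in repl
                           if len(p + r) <= max_len}
            out |= partial
        new[A] = L[A] | out  # the chain F^k(bottom) is increasing
    return new

L = {"S": set()}
for _ in range(5):
    L = step(L)
print(sorted(L["S"]))  # ['aaab', 'aab', 'ab', 'b']
```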
We shall prove three lemmas, which will lead to our desired result in a theorem.
Note for any set $S$, $\subseteq$ is a partial order on the subsets of $S$. So in our case, a partial order relation on tuples is defined componentwise: $(L_1, \ldots, L_n) \sqsubseteq (L'_1, \ldots, L'_n)$ iff $L_i \subseteq L'_i$ for every $i$.
Lemma 2: $F$ is a continuous function.
Proof: Let $L = (L_1, \ldots, L_n)$ and $L' = (L'_1, \ldots, L'_n)$. Suppose $L \sqsubseteq L'$; we want to show $F(L) \sqsubseteq F(L')$. Consider their $i$-th components: $\widehat{L}(\alpha_{i1} + \alpha_{i2} + \cdots)$ and $\widehat{L'}(\alpha_{i1} + \alpha_{i2} + \cdots)$.
It suffices to show that for every right-hand side $\alpha$, we have $\widehat{L}(\alpha) \subseteq \widehat{L'}(\alpha). \qquad (1)$
Let $\alpha = s_1 s_2 \cdots s_k$, where $s_j \in V$ for $j = 1, \ldots, k$. Note $k = 0$ means this is an $\varepsilon$-rule; then clearly (1) holds. If $k \geq 1$, then $\widehat{L}(\alpha) = \widehat{L}(s_1) \cdots \widehat{L}(s_k)$. If $s_j$ is a terminal, then $\widehat{L}(s_j) = \{s_j\} = \widehat{L'}(s_j)$. If $s_j$ is a nonterminal $X_m$, then $\widehat{L}(s_j) = L_m$ and $\widehat{L'}(s_j) = L'_m$; by assumption, $L_m \subseteq L'_m$. Thus, we conclude that (1) holds and $F$ is monotonic.
Next, consider any chain $L^{(0)} \sqsubseteq L^{(1)} \sqsubseteq \cdots$. Let $L^* = \bigsqcup_k L^{(k)}$. We need to show $F(L^*) = \bigsqcup_k F(L^{(k)})$. This is to show that for every $i$, their $i$-th components are equal. Let $P$ be the $i$-th component of $F(L^*)$ and $Q$ be the $i$-th component of $\bigsqcup_k F(L^{(k)})$. Then we need to show $P \subseteq Q$ and $Q \subseteq P$.
First, consider $w \in P$. If we inspect $P$ closely, it is obtained by rewriting each nonterminal in the rules for $X_i$ with the strings in the corresponding languages of $L^*$, and $w$ is in this set. Since $w$ uses only finitely many such strings and the chain is increasing, there must be some $k$ such that all of them already appear in $L^{(k)}$, so $w$ is in the $i$-th component of $F(L^{(k)})$. This shows $P \subseteq Q$.
Conversely, if $w \in Q$, there is some $k$ such that $w$ is in the $i$-th component of $F(L^{(k)})$. Then since $L^{(k)} \sqsubseteq L^*$ and $F$ is monotonic, $w \in P$. Hence $Q \subseteq P$. This completes the proof.
By Theorem 1 (Least Fixed Point) and Lemma 2, $F$ has a least fixed point. It is constructed as $\bigsqcup_k F^k(\bot)$.
Note $\bot = (\emptyset, \ldots, \emptyset)$ is the least element. Thus the least fixed point of $F$ is $L^\infty = \bigsqcup_k F^k(\emptyset, \ldots, \emptyset)$.
Now we want to prove $L^\infty_i = L(X_i)$ for all $i$, where $L(X_i)$ is the set of terminal strings derivable from $X_i$. This is established by the following two lemmas.
Lemma 3: $F^k(\bot)_i \subseteq L(X_i)$ for all $i$ and $k$.
Proof: We prove by induction on $k$. For $k = 1$, note $w \in F(\bot)_i$ if and only if $X_i \rightarrow w$ is a production rule with $w$ a terminal string. Thus $w$ can be derived from $X_i$ in one step and we clearly have $F(\bot)_i \subseteq L(X_i)$.
Assume it holds for $k$ for every $i$. Consider $w \in F^{k+1}(\bot)_i$. Note $F^{k+1}(\bot) = F(F^k(\bot))$. So $w$ arises from some rule $X_i \rightarrow s_1 \cdots s_l$, where each $s_j$ is a terminal or a nonterminal, and $w = w_1 \cdots w_l$ with $w_j = s_j$ for terminal $s_j$. By the induction assumption, for each nonterminal $s_j = X_m$ with $w_j \in F^k(\bot)_m$
we have $X_m \Rightarrow^* w_j$, i.e. $X_m$ derives $w_j$. Thus $X_i \Rightarrow s_1 \cdots s_l \Rightarrow^* w$. Hence $F^{k+1}(\bot)_i \subseteq L(X_i)$.
Lemma 4: $L(X_i) \subseteq L^\infty_i$, for all $i$.
Proof: We prove by induction on derivation length $l$. If $l = 1$, then there must be a production rule $X_i \rightarrow w$; hence $w \in F(\bot)_i \subseteq L^\infty_i$. Now assume it holds for derivation length less than $l$, for all $i$. Again, there must be some rule $X_i \rightarrow s_1 \cdots s_k$ from which $w$ is reached. So in the derivation there must exist a sub-derivation for each nonterminal $s_j$ (note we can always turn a derivation into an equivalent leftmost derivation) such that $s_j \Rightarrow^* w_j$ and eventually $w = w_1 \cdots w_k$. Note the length of each sub-derivation is less than $l$. Thus by the induction assumption, each $w_j$ belongs to $F^{k_j}(\bot)_{m_j}$ for some $k_j$. Now take $K = \max_j k_j$. Then we have $w \in F^{K+1}(\bot)_i \subseteq L^\infty_i$.
Finally with all discussions above, we arrive at
Theorem 5: $(L(X_1), \ldots, L(X_n))$ is the least fixed point of $F$.
Convert to Weak Greibach Normal Form
It is not difficult to see that if we can bring every rule into the form $X \rightarrow a\alpha$, where $a$ is a terminal and $\alpha$ can be any string, then we can further convert the grammar to the strong Greibach Normal Form stated at the beginning of this note. This form is called Weak Greibach Normal Form. As long as the first symbol is a terminal, we eliminate the possibility of left recursion. To achieve this, we need to introduce more "notation tricks" to represent production rules as matrix operations.
For a CFG $G$, as above we let $X_1, \ldots, X_n$ be its nonterminals and group productions by their left-hand-side nonterminals as in (1).
Furthermore, on the right-hand side, we can group the alternatives by their first symbol: those that start with a nonterminal $X_j$, and those that start with a terminal.
If we write this in "matrix" form, (1) becomes $X = XA + B$, where $X$ is the row vector $(X_1, \ldots, X_n)$, $A$ is an $n \times n$ matrix, and each entry $A_{ji}$ is a (possibly empty) finite subset of $V^*$ collecting what can follow the leading $X_j$ in the rules for $X_i$. Note if $A_{ji}$ is an empty set, then $X_j \cdot A_{ji} = \emptyset$. Finally, $B$ is also a row vector, where each entry is a finite set of strings that start with a terminal symbol.
So the set of production rules can be expressed as the matrix equation $X = XA + B. \qquad (3)$
Languages over $V$ (subsets of $V^*$) form a semiring with
Addition $+$: the union operation, i.e. $P + Q = P \cup Q$
Multiplication $\cdot$: concatenation, $P \cdot Q = \{pq : p \in P, q \in Q\}$
Additive identity $0$: this is the empty set $\emptyset$
Multiplicative identity $1$: this is the $\{\varepsilon\}$
For a matrix $A$, we let $A^* = I + A + A^2 + \cdots \qquad (4)$
We look at (3) again: in fact we can let $F(X) = XA + B$. Theorem 5 says that the tuple of derived languages is the least fixed point of $F$.
You can also find that this least fixed point is derived by applying $F$ an infinite number of times. Be careful that in (3), $A$ and $B$ can also contain the nonterminals $X_1, \ldots, X_n$; thus when we apply $F$, the nonterminals inside $A$ and $B$ are rewritten as well.
Now if we assume $A$ and $B$ do not contain any nonterminal, then the least fixed point of $F$ can be found by following the procedure in Theorem 1. Applying $F$ once we have $F(\bot) = B$. Applying it twice yields $BA + B$. Applying it $k$ times yields $B(I + A + \cdots + A^{k-1})$. Thus, in the limit we get the least fixed point as $X = BA^*$.
Lemma 6: A grammar whose production rules are expressed as $X = XA + B$, where $A, B$ do not contain any nonterminal, has the least fixed point $X = BA^*$.
Proof: It has been shown above.
Now the magic that turns any $G$ into weak Greibach form is to create a matrix $Y = (Y_{ji})$ of new nonterminals and rewrite the production rules from $X = XA + B$ to $X = BY + B,\; Y = AY + A$. Note $Y$ above is an $n \times n$ matrix.
We call this new grammar $G'$. We let $(L_X, L_Y)$ denote the least fixed point of $G'$ (where $L_X$ corresponds to the $X$ components, and $L_Y$ to $Y$), and $L$ denote the least fixed point of $G$. The elegance is that $L_X = L$, and hence $G$ and $G'$ produce the same language. After this operation, the rules in $X = BY + B$ all start with a terminal (we may assume no $\varepsilon$-rule or chain rule exists, by first converting to CNF). To further make the rules in $Y = AY + A$ all start with a terminal symbol, notice that the entries of $A$ begin with nonterminals among $X_1, \ldots, X_n$ only; once every $X_i$-rule starts with a terminal, we simply substitute the $X_i$-rules for the leading nonterminals in $A$, and thus $G'$ would meet our needs.
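A concrete one-nonterminal instance of this rewriting is the textbook removal of left recursion (grammar and helper names assumed for illustration): $X = XA + B$ with $A = \{a\}$, $B = \{b\}$ becomes $X \rightarrow bY \mid b$, $Y \rightarrow aY \mid a$. A brute-force check that both grammars generate the same strings up to length 5:

```python
def gen(rules, start, max_len):
    """All terminal strings of length <= max_len derivable from start.
    rules: dict nonterminal -> list of right-hand sides (tuples of symbols)."""
    out, frontier = set(), {(start,)}
    while frontier:
        form = frontier.pop()
        i = next((j for j, s in enumerate(form) if s in rules), None)
        if i is None:                      # no nonterminal left
            if len(form) <= max_len:
                out.add("".join(form))
            continue
        for alt in rules[form[i]]:         # rewrite leftmost nonterminal
            new = form[:i] + alt + form[i + 1:]
            if sum(1 for s in new if s not in rules) <= max_len:
                frontier.add(new)
    return out

left = {"X": [("X", "a"), ("b",)]}       # X -> Xa | b (left recursive)
right = {"X": [("b", "Y"), ("b",)],      # X -> bY | b
         "Y": [("a", "Y"), ("a",)]}      # Y -> aY | a
print(gen(left, "X", 5) == gen(right, "X", 5))  # True
```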
Now it remains to prove that $L_X = L$.
Theorem 7: Let $G$ be expressed as $X = XA + B$,
and $G'$ be expressed as $X = BY + B,\; Y = AY + A$.
If $L$ is the least fixed point of $G$ and $(L_X, L_Y)$ is the least fixed point of $G'$, then we have $L_X = L$.
Proof: Since is the least fixed point of , we have . Because contains no nonterminals, by Lemma 6 the least fixed point of is . Therefore, (since by (1) is also a solution to is ).
Note the functions are both monotonic, so by (2) we have
In the least fixed point theorem, for any with we have . So if we let , then (3) is showing with . Therefore, . Combining (2) and (4) we get
Similarly for , since is the least fixed point we have
Again using the trick of Lemma 6 and the fact that does not contain any nonterminal, we have the least fixed point of is . But (6) exactly states that is also a solution. So we have
Now notice . Again we have and and . Since is the least fixed point of , we have . Combining (7) and (8) we have
To proceed, observe that is a solution to . In fact, is true because by (5) we have . Also is an identity. Since is the least fixed point of , we must have .
On the other hand, using first half of (6) and result of (9) we get . We can see that is a solution to by (10). In fact,
. Since is the least fixed point of , we must have . This completes the proof.
A CFG is said to be in Chomsky Normal Form (CNF) if all its productions are only of the following forms:
$A \rightarrow BC$, where $A, B, C$ are non-terminals
$A \rightarrow a$, where $a$ is a terminal
$S \rightarrow \varepsilon$ only if $\varepsilon \in L(G)$, and then $S$ never occurs on the right-hand side of any production rule
The above is saying you either produce 2 non-terminals, or produce 1 terminal. There are no "$\varepsilon$-rules" and no "chain rules". The only possible $\varepsilon$-production is directly from the start symbol.
We can build an algorithm to convert any CFG to an equivalent one in CNF. The idea is to do it in 3 steps: (1) eliminate $\varepsilon$-rules; (2) eliminate chain rules; (3) restrict all right-hand sides to either 2 non-terminals or 1 terminal.
Eliminate $\varepsilon$-rules
For a non-terminal $A$, if there is some chain of derivations that eventually leads to $A \Rightarrow^* \varepsilon$, then we call $A$ "erasable". If we eliminate all $\varepsilon$-rules, then we need to effectively account for all "erasable chains" as well.
Formally, for a CFG $G$ we use $\mathcal{E}$ to denote the set of erasable non-terminals. We construct it inductively as follows:
$\mathcal{E}_0 = \{A : A \rightarrow \varepsilon \text{ is a rule}\}$, and $\mathcal{E}_{k+1} = \mathcal{E}_k \cup \{A : A \rightarrow X_1 \cdots X_m \text{ is a rule with every } X_j \in \mathcal{E}_k\}$
We can see $\mathcal{E}_0 \subseteq \mathcal{E}_1 \subseteq \cdots$. Since $G$ only has finitely many non-terminals, there must be a max $K$ such that $\mathcal{E}_K = \mathcal{E}_{K+1}$; we then let $\mathcal{E} = \mathcal{E}_K$. We can formally prove that $\mathcal{E}$ contains exactly the erasable non-terminals. The following theorem gives a procedure to remove all $\varepsilon$-rules while letting the new grammar generate the same language.
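The inductive construction above is a straightforward fixed-point computation; a sketch (the rule encoding is mine):

```python
def erasable(rules):
    """rules: dict nonterminal -> list of right-hand sides (tuples of
    symbols); () encodes an eps-rule. Returns the set of erasable
    nonterminals by iterating the inductive construction to a fixed point."""
    E, changed = set(), True
    while changed:
        changed = False
        for A, alts in rules.items():
            if A not in E and any(all(s in E for s in alt) for alt in alts):
                E.add(A)
                changed = True
    return E

rules = {"A": [()], "B": [("A", "A")], "C": [("b",), ("B",)], "D": [("d",)]}
print(sorted(erasable(rules)))  # ['A', 'B', 'C']
```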
Theorem 1 (Eliminate $\varepsilon$-rules): For any CFG $G$, there is an equivalent $G'$ such that:
There are no rules of the form $A \rightarrow \varepsilon$,
except that if $\varepsilon \in L(G)$, the only possible such rule is $S' \rightarrow \varepsilon$ and $S'$ does not appear on the right-hand side of any rule.
Proof: If $\varepsilon \in L(G)$, we create a new start non-terminal $S'$ and add $S' \rightarrow S$ and $S' \rightarrow \varepsilon$ to the rules. This makes sure that $S'$ will never occur on the right-hand side of any rule.
We proceed to eliminate all $\varepsilon$-rules, keeping all the original non-$\varepsilon$ rules as a subset of the new rule set.
Now we need to add some new rules to maintain the "erasability" of the original symbols. This is done by constructing a set of new rules: for every rule $A \rightarrow X_1 \cdots X_m$ and every way of deleting a non-empty subset of erasable $X_j$'s, add the resulting rule, unless the whole right-hand side becomes $\varepsilon$.
So the new rule set is the union of the retained rules, the new rules, and (if $\varepsilon \in L(G)$) the two $S'$-rules. This completes the proof.
Note in the above, a right-hand side can contain several erasable non-terminals. You should convince yourself that the construction indeed includes all possible new rules.
Eliminate Chain Rules
Given any CFG, with Theorem 1 we obtain an equivalent grammar with no $\varepsilon$-rules. The next step is to eliminate chain rules.
Here we construct inductively, for a non-terminal $A$, its set of chainable non-terminals:
$C_0(A) = \{A\}$, and $C_{k+1}(A) = C_k(A) \cup \{C : B \rightarrow C \text{ is a rule for some } B \in C_k(A)\}$
So $C_0(A) \subseteq C_1(A) \subseteq \cdots$, and there is a max $K$ such that $C(A) = C_K(A) = C_{K+1}(A)$.
Theorem 2 (Eliminate Chain Rules): For any CFG $G$ with no $\varepsilon$-rules, there is an equivalent $G'$ such that, except for the possible $S' \rightarrow \varepsilon$, no rule has a single non-terminal as its entire right-hand side.
Proof: We first eliminate all unit rules $A \rightarrow B$ whose right-hand side is a single non-terminal $B$. Then we need to add back some rules to maintain equivalence.
For a non-terminal $A$, consider its chain set $C(A)$. For each $B \in C(A)$, if there is some non-unit rule $B \rightarrow \alpha$ (i.e. $\alpha$ is a terminal or has length at least 2), then we add a new rule $A \rightarrow \alpha$.
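The chain sets $C(A)$ can be computed the same way, as reachability over unit rules (rule encoding assumed as before):

```python
def chain_set(rules, A):
    """Nonterminals reachable from A via unit rules B -> C, where a symbol
    counts as a nonterminal iff it has rules of its own."""
    seen, stack = {A}, [A]
    while stack:
        X = stack.pop()
        for alt in rules.get(X, []):
            if len(alt) == 1 and alt[0] in rules and alt[0] not in seen:
                seen.add(alt[0])
                stack.append(alt[0])
    return seen

rules = {"A": [("B",), ("a",)], "B": [("C",)], "C": [("c",)]}
print(sorted(chain_set(rules, "A")))  # ['A', 'B', 'C']
```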
Final Transformation
After eliminating $\varepsilon$-rules and chain rules, all the rules are of the forms:
$S' \rightarrow \varepsilon$ if $\varepsilon \in L(G)$, where $S'$ does not appear on the right-hand side of any rule
$A \rightarrow \alpha$, where either $\alpha$ is a single terminal, or $|\alpha| \geq 2$.
Theorem 3: For any CFG , there is an equivalent in Chomsky Normal Form.
Proof: We assume $G$ has no $\varepsilon$-rules and no chain rules. First we deal with terminal productions. For a rule $A \rightarrow a$ with $a$ a single terminal, we retain it. If the right-hand side consists of terminals but has length $n \geq 2$, i.e. $A \rightarrow a_1 a_2 \cdots a_n$, then we
create new non-terminals $P_1, \ldots, P_{n-2}$ and $P'_1, \ldots, P'_n$,
and create the following new rules:
$A \rightarrow P'_1 P_1$, $P_1 \rightarrow P'_2 P_2$, …, $P_{n-2} \rightarrow P'_{n-1} P'_n$,
$P'_k \rightarrow a_k$ for $k = 1, \ldots, n$.
Now if the right-hand side contains some non-terminals, then we
assume $A \rightarrow w_0 B_1 w_1 B_2 \cdots B_m w_m$, where the $B_j$ are non-terminals and the $w_j$ are (possibly empty) terminal strings,
create a new non-terminal $W_j$ for each non-empty $w_j$, replace $w_j$ by $W_j$, and add the rule $W_j \rightarrow w_j$; for each such rule, repeat the above process to transform it into CNF form (if $w_j$ is empty, we ignore it),
so the rule becomes $A \rightarrow C_1 C_2 \cdots C_r$ with all $C_i$ non-terminals; we then create new non-terminals $Q_1, \ldots, Q_{r-2}$ and new rules:
$A \rightarrow C_1 Q_1$, $Q_1 \rightarrow C_2 Q_2$, …, $Q_{r-2} \rightarrow C_{r-1} C_r$.
CYK Parser
For a CFG $G$ in Chomsky Normal Form, a CYK parser can both recognize and parse any string $w$. Its worst case takes $O(n^3 \cdot |G|)$ time, where $n = |w|$. In practice, almost no grammar is designed in CNF. And even if we can transform a grammar to CNF, a CYK parser can only produce a CNF parse tree, not a parse tree of the original grammar. Thus, CNF and CYK parsers are not popular in practice. We will not note any further details here.
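For completeness, a compact sketch of the CYK dynamic program (the grammar encoding and names are mine; the example grammar is a CNF grammar for $\{a^n b^n : n \geq 1\}$):

```python
def cyk(w, binary, unary, start="S"):
    """CYK recognition for CNF grammars: binary = {A: [(B, C), ...]},
    unary = {A: [terminal, ...]}. T[i][l] holds the nonterminals deriving
    the substring of length l starting at position i."""
    n = len(w)
    if n == 0:
        return False
    T = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, a in enumerate(w):
        for A, terms in unary.items():
            if a in terms:
                T[i][1].add(A)
    for l in range(2, n + 1):           # span length
        for i in range(n - l + 1):      # span start
            for k in range(1, l):       # split point
                for A, pairs in binary.items():
                    for B, C in pairs:
                        if B in T[i][k] and C in T[i + k][l - k]:
                            T[i][l].add(A)
    return start in T[0][n]

# S -> AX | AB, X -> SB, A -> a, B -> b  generates a^n b^n (n >= 1)
binary = {"S": [("A", "X"), ("A", "B")], "X": [("S", "B")]}
unary = {"A": ["a"], "B": ["b"]}
print(cyk("aabb", binary, unary), cyk("aab", binary, unary))  # True False
```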